45 research outputs found

    Analysis, interpretation and synthesis of facial expressions

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1995. Includes bibliographical references (leaves 121-130). By Irfan Aziz Essa.

    Coding, Analysis, Interpretation, and Recognition of Facial Expressions

    We describe a computer vision system for observing facial motion by using an optimal estimation optical flow method coupled with a geometric and a physical (muscle) model describing the facial structure. Our method produces a reliable parametric representation of the face's independent muscle action groups, as well as an accurate estimate of facial motion. Previous efforts at analysis of facial expression have been based on the Facial Action Coding System (FACS), a representation developed in order to allow human psychologists to code expression from static pictures. To avoid use of this heuristic coding scheme, we have used our computer vision system to probabilistically characterize facial motion and muscle activation in an experimental population, thus deriving a new, more accurate representation of human facial expressions that we call FACS+. We use this new representation for recognition in two different ways. The first method uses the physics-based model directly, by recognizing…
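    The step from observed motion to muscle action groups can be sketched as a linear inverse problem. This is a minimal illustration, not the paper's physics-based model: the flow field, the muscle basis, and the least-squares solve below are all assumptions made for the example.

```python
import numpy as np

# Hypothetical setup: each column of `basis` is the flattened 2D flow
# field produced by unit activation of one muscle group; `flow` is an
# observed optical-flow field flattened the same way.
rng = np.random.default_rng(0)
n_pixels, n_muscles = 200, 6
basis = rng.normal(size=(2 * n_pixels, n_muscles))

true_activation = np.array([0.5, 0.0, 1.2, 0.0, 0.3, 0.0])
flow = basis @ true_activation

# Recover muscle activations by least squares: a parametric
# representation of the observed facial motion.
activation, *_ = np.linalg.lstsq(basis, flow, rcond=None)
print(np.round(activation, 3))  # recovers true_activation
```

    With a full-rank basis and noise-free flow the recovery is exact; in practice the estimation would be regularized against flow noise.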

    Machine Learning for Video-Based Rendering

    We recently introduced a new paradigm for computer animation, video textures, which allows us to use a recorded video to generate novel animations by replaying the video samples in a new order. Video sprites are a special type of video texture. Instead of storing whole images, the object of interest is separated from the background and the video samples are stored as a sequence of alpha-matted sprites with associated velocity information. They can be rendered anywhere on the screen to create a novel animation of the object. To create such an animation, we have to find a sequence of sprite samples that is visually smooth and shows the desired motion. In this paper, we address both problems. To estimate visual smoothness, we train a linear classifier to estimate visual similarity between video samples. If the motion path is known in advance, we use a beam search algorithm to find a good sample sequence. We can also specify the motion interactively by precomputing a set of cost functions using Q-learning.
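    The sample-sequencing step can be sketched as a beam search over per-transition costs. The cost matrix and beam width below are invented for illustration; in the paper the cost would come from the learned visual-similarity classifier plus deviation from the desired motion path.

```python
import numpy as np

def beam_search_sequence(cost, start, length, beam_width=3):
    """Find a low-cost sequence of sprite samples.

    cost[i][j] is the (assumed precomputed) penalty for showing
    sample j immediately after sample i.
    """
    beams = [([start], 0.0)]
    for _ in range(length - 1):
        candidates = []
        for path, c in beams:
            last = path[-1]
            for nxt in range(len(cost)):
                candidates.append((path + [nxt], c + cost[last][nxt]))
        candidates.sort(key=lambda pc: pc[1])  # keep the cheapest paths
        beams = candidates[:beam_width]
    return beams[0]

# Toy 4-sample cost matrix: low values mean visually smooth transitions.
cost = np.array([[9, 1, 9, 9],
                 [9, 9, 1, 9],
                 [9, 9, 9, 1],
                 [1, 9, 9, 9]], dtype=float)
path, total = beam_search_sequence(cost, start=0, length=5)
print(path, total)  # the cheap cycle 0 -> 1 -> 2 -> 3 -> 0
```

    Widening the beam trades computation for a better chance of finding the globally smoothest sequence.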

    Localization and 3D Reconstruction of Urban Scenes Using GPS

    Using off-the-shelf Global Positioning System (GPS) units, we reconstruct buildings in 3D by exploiting the reduction in signal to noise ratio (SNR) that occurs when the buildings obstruct the line-of-sight between the moving units and the orbiting satellites. We measure the size and height of skyscrapers and automatically construct a density map representing the location of multiple buildings in an urban landscape. If deployed on a large scale, via a cellular service provider's GPS-enabled mobile phones or GPS-tracked delivery vehicles, the system could provide an inexpensive means of continuously creating and updating 3D maps of urban environments.
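    The height estimate implied by the abstract can be sketched with simple shadowing geometry. This is an assumed simplification, not the paper's method: it supposes the SNR drop ends exactly where the receiver's line-of-sight to the satellite clears the rooftop.

```python
import math

def building_height(shadow_edge_distance_m, satellite_elevation_deg):
    """Estimate building height from the point where full SNR returns.

    Assumed geometry: the satellite's line-of-sight just grazes the
    rooftop at the shadow edge, so height = distance * tan(elevation).
    """
    return shadow_edge_distance_m * math.tan(
        math.radians(satellite_elevation_deg))

# A moving GPS unit regains full SNR for a satellite at 45 degrees
# elevation once it is 80 m from the building's wall:
print(round(building_height(80.0, 45.0), 1))  # 80.0 m tall
```

    Repeating this for many satellites and many receiver positions is what would let shadow edges accumulate into a density map of building outlines.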

    SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

    In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos. SPAE converts between raw pixels and interpretable lexical tokens (or words) extracted from the LLM's vocabulary. The resulting tokens capture both the semantic meaning and the fine-grained details needed for visual reconstruction, effectively translating the visual content into a language comprehensible to the LLM, and empowering it to perform a wide array of multimodal tasks. Our approach is validated through in-context learning experiments with frozen PaLM 2 and GPT 3.5 on a diverse set of image understanding and generation tasks. Our method marks the first successful attempt to enable a frozen LLM to generate image content while surpassing state-of-the-art performance in image understanding tasks, under the same setting, by over 25%. Comment: NeurIPS 2023 spotlight.
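    The pixels-to-words conversion can be sketched as nearest-neighbor quantization against a frozen vocabulary's embeddings. The four-word vocabulary, the 2D embeddings, and the patch features below are all invented for illustration; SPAE's actual tokenizer is a learned pyramid autoencoder.

```python
import numpy as np

# Toy stand-in for a frozen LLM vocabulary and its embedding table.
vocab = ["sky", "grass", "water", "sand"]
embeddings = np.array([[0.0, 1.0],
                       [1.0, 0.0],
                       [0.3, 0.8],
                       [0.8, 0.3]])

def to_lexical_tokens(features):
    """Map each visual feature vector to the nearest vocabulary
    embedding, yielding interpretable words rather than opaque
    codebook indices."""
    dists = np.linalg.norm(
        features[:, None, :] - embeddings[None, :, :], axis=-1)
    return [vocab[i] for i in dists.argmin(axis=1)]

# Two hypothetical image-patch feature vectors:
patch_features = np.array([[0.1, 0.95],
                           [0.95, 0.1]])
print(to_lexical_tokens(patch_features))  # ['sky', 'grass']
```

    Because the output is ordinary words, the frozen LLM can consume or emit them with no weight updates, which is the point of the approach.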

    CAREER: Developing and Evaluating a Spatio-Temporal Representation for Analysis, Modeling, Recognition and Synthesis of Facial Expressions

    Issued as final report. National Science Foundation.

    Contact detection, collision forces and friction for physically based virtual world modeling

    Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Civil Engineering, 1990. Includes bibliographical references (leaves 142-145). By Irfan A. Essa.

    Aware Home: Sensing, Interpretation, and Recognition of Everyday Activities

    Presentation given in the Wilby Room of the Georgia Tech Library and Information Center. The Aware Home project is a unique living laboratory for exploration of ubiquitous computing in a domestic setting. Dr. Essa's talk will present ongoing research in the area of developing technologies within a residential setting that will affect our everyday living, specifically concentrating on the sensing and perception technologies that can enable a home environment to be aware of the whereabouts and activities of its occupants. The discussion will include the use of computer vision, audition work and other efforts in computational perception to track and monitor the residents, as well as methods being developed to recognize the residents' activities over short and extended periods. The technological, design and engineering research challenges inherent in this problem domain, and the focus on awareness to help maintain independence and quality of life for an aging population, will also be explored. The project is located in the Georgia Tech Broadband Institute's Residential Laboratory.

    Coding, Analysis, Interpretation, and Recognition of Facial Expressions

    We describe a computer vision system for observing facial motion by using an optimal estimation optical flow method coupled with geometric, physical and motion-based dynamic models describing the facial structure. Our method produces a reliable parametric representation of the face's independent muscle action groups, as well as an accurate estimate of facial motion. Previous efforts at analysis of facial expression have been based on the Facial Action Coding System (FACS), a representation developed in order to allow human psychologists to code expression from static pictures. To avoid use of this heuristic coding scheme, we have used our computer vision system to probabilistically characterize facial motion and muscle activation in an experimental population, thus deriving a new, more accurate representation of human facial expressions that we call FACS+. Finally, we show how this method can be used for coding, analysis, interpretation, and recognition of facial expressions.

    Computers Seeing People
